ABSTRACT

Aims. Revealing the nature of unassociated high-energy (> 100 MeV) γ-ray sources remains a
challenge 35 years after their discovery. Of the 934 γ-ray sources at high Galactic latitude
(|b| > 15°) in the First Fermi-LAT catalogue (1FGL), 316 have no obvious associations at other
wavelengths. We present an improved method that automatically ranks counterparts based on their
similarity with identified γ-ray sources.

Methods. In this paper, we apply the K-means unsupervised classification algorithm to isolate
potential counterparts for 18 unassociated Fermi sources contained within a ~ 3000 deg² 'overlap
region' of the sky intensively covered in radio and optical wavelengths.

Results. Combining our results with previous works, we reach potential associations for 119 out
of the 128 Fermi sources within said region. If these associations are correct, we estimate that
less than 20% of all remaining unassociated 1FGL sources at high Galactic latitude (|b| > 15°)
might host 'exotic' counterparts distinct from known classes of γ-ray emitters. Potentially even
these outliers could be explained by high-redshift/dust-obscured analogues of the associated
sample or by intrinsically faint radio objects.
Conclusions. Although our estimate of exotic sources leaves some room for novel discoveries, it
severely restricts the possibility of detecting dark matter subhaloes and other unconventional
types of γ-ray emitters in the 1FGL. In closing, we argue that the identification of Fermi
sources at the low end of the flux density distribution will be a complex process that might
only be achieved through a clever combination of refined classification algorithms,
multi-wavelength efforts, and dedicated optical spectroscopy.

Key words. Gamma rays: general - Cosmology: dark matter - Catalogs - Methods: statistical 



1. Introduction 

The nature of high-energy (> 100 MeV) γ-ray sources lacking counterparts at other wavelengths
remains an enigma decades after their discovery. Early work on individual unassociated γ-ray
sources dates back to the days of the Cos B satellite (Julien & Helmken 1978). The mystery
intensified with the discovery of 271 γ-ray sources by the EGRET instrument aboard the Compton
Gamma-Ray Observatory (Hartman et al. 1999). To date, about 130 of these EGRET sources remain
unassociated, with about half located at high Galactic latitude (Mukherjee & Halpern 2004;
Thompson 2008). The latest source count in the 100 MeV to 100 GeV energy range produced by the
Large Area Telescope (LAT) instrument on board the Fermi Gamma-ray Space Telescope has expanded
the number of persistent high-energy γ-ray sources at high Galactic latitude (|b| > 15°) to 934
(Abdo et al. 2010a). But despite its superb angular and energy resolution relative to EGRET,
only 618 of these sources are confidently associated. This leaves 316 unassociated Fermi sources
at |b| > 15°.

While multi-wavelength strategies have evolved considerably, the difficulties inherent to the
identification process of γ-ray sources (i.e. arcminute-scale error regions) continue to afford
some room for speculating about the nature of the unassociated population. Montmerle (1979)
estimated that a combination of supernova remnants, OB groups, and even H II regions could
account for a number of the as-yet unassociated γ-ray sources (see also the ground-breaking work
done by Morrison 1958). Off the Galactic plane, there are statistical hints of γ-ray emission
from stacked galaxy clusters (Scharf & Mukherjee 2002). Recently, a number of authors have
pointed out that even rarer γ-ray emitters, including subhaloes left behind during structure
formation, could be detected by Fermi (Pieri, Bertone & Branchini 2008; Kuhlen, Madau & Silk
2009; Buckley & Hooper 2010). Throughout the text, we shall use the term 'exotic' to refer to
any type of γ-ray source that is clearly distinct from the associations described in Abdo et al.
(2010a).

Typically, the procedure used to generate γ-ray source associations relies on the positional
coincidences between Fermi sources and catalogues of plausible γ-ray emitters categorised at
other wavelengths (Reimer & Torres 2007; Abdo et al. 2010a). A lengthier approach to source
association involves the brute-force search for a counterpart using deep radio, X-ray, and
optical observations, together with spectroscopic classification of notable objects within its
γ-ray error region (Mirabal et al. 2000; Mukherjee et al. 2000). If no plausible association can
be produced through either method, then the source would fall within our definition of 'exotic'
as it must be lacking one or more of the fundamental attributes of the known classes of γ-ray
emitters. Regardless of the association method, a firm physical link between a given γ-ray
source and a counterpart in another wavelength can only be established through contemporaneous
temporal variability, similar spatial morphology, or equivalent pulsation. However, only a small
fraction of γ-ray sources meets any of these criteria. As a result, the large majority of
current associations are only probabilistic.

Apart from the shortcomings of current classification methods, our ability to associate γ-ray
sources rests on the quality of the catalogues used for that purpose (Abdo et al. 2010a).
Generally, most astronomical databases used for source association have been constructed from
disjointed surveys with limited spatial coverage and flux limits. Naturally, the association
process reflects such inhomogeneities: some areas of the sky are much better covered than
others. In other words, the overall likelihood of finding a source association on positional
coincidence alone is not uniform across the Fermi sky.

With few options left for improving our γ-ray focusing capabilities and catalogue coverage in
the short term, the best hopes for identifying the rest of the unassociated γ-ray sources may
lie in a further refinement and application of classification algorithms. A possible way forward
is to take advantage of proven classification tools borrowed from the data mining and machine
learning communities to improve searches of individual Fermi LAT source fields. A number of such
algorithms have been applied in an assortment of astrophysical applications. Ball et al. (2006)
used decision trees to provide star/galaxy classification for the entire Sloan Digital Sky
Survey (SDSS) data release. Agglomerative hierarchical clustering and K-means clustering have
also been used for spectral classification in X-rays (see Hojnacki et al. 2006, and references
therein).

Here we describe an application of K-means clustering as a classifier of objects inside the
error regions of unassociated Fermi objects and the subsequent estimate of the 'exotic' fraction
among the catalogued Fermi population. Given the extreme dust extinction and source crowding
close to the Galactic plane, we restrict our analysis to high Galactic latitude. Further, and in
order to maximise the likelihood of association, we turn to one of the most intensively studied
areas of the sky away from the Galactic plane, namely the 'overlap region' (Kimball & Ivezić
2008). This ~ 3000 deg² area is defined by the overlap of four radio catalogues: Green Bank 6 cm
Survey (GB6), Faint Images of the Radio Sky at Twenty Centimeters (FIRST), NRAO-VLA Sky Survey
(NVSS), and the Westerbork Northern Sky Survey (WENSS), as well as photometry and spectroscopy
collected by the SDSS.

The structure of the paper is as follows: §2 summarises the data selection, §3 describes the
K-means classification algorithm, §4 details the results of the classification process, and §5
discusses spectral typing. Discussion and conclusions are presented in §6 and §7.

2. Data description 

The 'overlap region' is a ~ 3000 deg² strip around the North Galactic Cap extending between
7.6 h < R.A. < 17.8 h, +28.8° < decl. < +63.2° where the FIRST (20 cm), NVSS (20 cm), WENSS
(92 cm), GB6 (6 cm) radio surveys, and the SDSS optical survey coincide. This region is well
outside of the Galactic plane (|b| ≳ 25°) and provides an ideal location to identify
high-latitude γ-ray sources. A full description of the limits of each survey as well as the
respective restrictions in sky coverage is given in Kimball & Ivezić (2008).

The First Fermi-LAT catalogue (1FGL) consists of 1451 sources characterised in the 100 MeV -
100 GeV energy range (Abdo et al. 2010a). Within the 'overlap region', we have identified 128
1FGL sources (Abdo et al. 2010a) that are simultaneously covered by all four radio surveys and
the SDSS photometric survey. A total of 110 sources are paired with plausible associations in
the 1FGL (Abdo et al. 2010a) or the First LAT AGN Catalogue (1LAC, Abdo et al. 2010b). The
remaining 18 correspond to sources designated as unassociated. Figure 1 shows the distribution
of these sources and a general footprint of the 'overlap region'.

The sample of associated sources comprises 61 BL Lacertae objects (BL Lacs), 41 flat-spectrum
radio quasars (FSRQs), and 8 objects classified as active galactic nuclei (AGN) of rare or
unknown type. For practical purposes, we treat these 110 associated sources as the main source
of training and testing sets for the K-means algorithm described in §3.

Fig. 1. Fermi LAT all-sky map for energies > 10 GeV in equatorial projection. Circles indicate
associated sources. Large diamonds mark unassociated Fermi sources. The marker sizes have been
greatly exaggerated for easier visualisation. The footprint of the 'overlap region' is outlined
by the locations of Fermi sources. The continuous strip of γ-ray emission tracks the Galactic
plane.

Since it is not our intention to reinvent matching algorithms for radio surveys, we used the
procedure introduced by Kimball & Ivezić (2008) to ensure physically real matches. After
collecting the associated sources, we searched for their respective radio counterpart in the
FIRST catalogue (Becker, White & Helfand 1995). We then positionally matched the FIRST location
to its closest WENSS detection using a 30" matching radius around the FIRST position. Next, we
matched the FIRST detection to GB6 using a 70" search radius. Once the matches were completed,
the spectral index α for each radio counterpart detected in at least two frequency bands was
calculated assuming S_ν ∝ ν^α.
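As an illustration of the last step, a two-point spectral index follows directly from the
power-law assumption S_ν ∝ ν^α. The sketch below is only a minimal example: the function name
and the flux densities are hypothetical, and the survey frequencies are those quoted in the text.

```python
import math

def spectral_index(s1_mjy, nu1_ghz, s2_mjy, nu2_ghz):
    """Two-point spectral index alpha, assuming S_nu is proportional to nu**alpha."""
    if min(s1_mjy, s2_mjy, nu1_ghz, nu2_ghz) <= 0 or nu1_ghz == nu2_ghz:
        raise ValueError("flux densities and frequencies must be positive and distinct")
    return math.log(s2_mjy / s1_mjy) / math.log(nu2_ghz / nu1_ghz)

# Hypothetical flat-spectrum source: same flux density at 1.4 GHz (FIRST) and 4.85 GHz (GB6).
alpha_flat = spectral_index(100.0, 1.4, 100.0, 4.85)

# Hypothetical steep-spectrum source: brighter at 326 MHz (WENSS) than at 1.4 GHz (FIRST).
alpha_steep = spectral_index(200.0, 0.326, 100.0, 1.4)
```

With this sign convention a flat spectrum gives α close to zero, while a source that fades
towards higher frequencies gives a negative α.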

One of the concerns associated with matching algorithms applied to radio catalogues with widely
different angular resolutions is the possibility that the sample might be contaminated by
coincidental, physically unrelated sources. Kimball & Ivezić (2008) have used the distribution
of angular distances to the nearest neighbour source to find the fraction of matches from the
FIRST, WENSS, and GB6 that are physically real. For the samples used here, the estimated
efficiencies are FIRST-WENSS (> 92%), FIRST-GB6 (> 79%), and FIRST-SDSS (> 95%) respectively.
Thus, we can safely assume that at least 79% of all the radio sources are properly matched.

Internally, we further validate the reported efficiency of the matching procedure of Kimball &
Ivezić (2008) within our sample. All of the 110 associated Fermi sources have a 1.4 GHz FIRST
counterpart brighter than 2.5 mJy, 95% are detected by WENSS to a limiting flux of 18 mJy, and
roughly 93% show a GB6 source brighter than 18 mJy. The disappearance of WENSS and GB6
counterparts occurs predominantly for BL Lac associations at the faint end of the FIRST radio
flux density distribution (S_1.4 ≲ 31 mJy). But for the most part, the radio regime excels at
capturing the non-thermal emission from γ-ray sources (see also Kovalev 2009; Giroletti et al.
2010; Mahony et al. 2010; Ghirlanda et al. 2010). Figure 2 shows the distribution of FIRST radio
fluxes for the associated Fermi sources.



3. Classification algorithm

K-means is a multivariate, iterative method that automatically finds K 'natural' clusters in a
specific dataset. In its simplest form, each object is assigned to the cluster with the nearest
cluster centroid (MacQueen 1967; Hojnacki et al. 2006). The partition process is repeated
automatically until convergence has been reached, i.e. no object is reassigned to a different
cluster. Since our aim is to locate potential associations within the sample of unassociated
Fermi sources, we considered three input variables: the radio spectral index α_92 between WENSS
(326 MHz) and FIRST (1.4 GHz), the radio spectral index α_6 between FIRST (1.4 GHz) and GB6
(4.85 GHz), and the γ-ray photon spectral index Γ derived in the 100 MeV-100 GeV range (Abdo et
al. 2010a).

Fig. 2. Fermi γ-ray flux density vs. 1.4 GHz FIRST flux density (S_1.4) within the 'overlap
region'. Small dots (red) represent associated Fermi sources. Black squares indicate new
associations produced by the K-means classification algorithm. Arrows indicate radio upper
limits for unassociated Fermi sources. The line follows an indicative power-law fit of the γ-ray
flux density f_γ against S_1.4.

We know a priori that there are two 'natural' clusters in the associated sample: BL Lacs with an
average photon index of Γ = 2.18 ± 0.02 and FSRQs with an average Γ = 2.48 ± 0.02 (Abdo et al.
2010b). Thus, the K-means algorithm was initially run on the associated Fermi sources assuming
K = 2 as an input parameter. Figures 3 and 4 show the final separation into two distinct
clusters. The unsupervised algorithm does a superb job separating spectroscopically distinct
FSRQs (Γ ~ 2.48) from BL Lacs (Γ ~ 2.01). However, there is a clear conflict region at the
boundary of the clusters that results in ~ 30% false positives for spectroscopic BL Lacs and
~ 33% false positives for spectroscopic FSRQs. While the classification is not perfect, the
results demonstrate the effectiveness of unsupervised classification algorithms in this context.
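The K = 2 partition described above can be sketched with a plain Lloyd-style iteration: assign
every object to the nearest centroid, recompute the centroids, and stop once no object changes
cluster. This is only a minimal illustration and not the code used for the paper; the feature
values and starting centroids below are invented, with each tuple standing for two radio
spectral indices and a photon index.

```python
import math

def kmeans(points, initial_centroids, max_iter=100):
    """Minimal K-means: iterate assignment and centroid update until labels stop changing."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid (Euclidean distance).
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:  # convergence: no reassignments
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Hypothetical (radio index, radio index, photon index) triples forming two groups.
points = [
    (0.1, 0.2, 2.0), (0.0, 0.1, 2.1), (-0.1, 0.0, 1.9),
    (0.0, -0.2, 2.5), (0.1, -0.1, 2.4), (-0.1, -0.3, 2.6),
]
labels, centroids = kmeans(points, [points[0], points[-1]])
```

In this toy case the first three objects end up in one cluster and the last three in the other.
Real samples overlap near the cluster boundary, which is where the false positives quoted above
arise.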

To aid in the identification of possible counterparts within the error regions of unassociated
Fermi sources, we need to define a locus that can help us recognise the position of possible
associations in Γ-α space. We accomplish this by splitting the associated sample 100 times into
random training (70% of the total) and testing (30% of the total) sets. For each individual
training set, we ran the K-means algorithm to automatically find the cluster centroids of that
particular set. Subsequently, its associated testing set allowed us to quantify the performance
of the algorithm by counting the number of false positives as a function of threshold around the
centroid. Additional refinement was achieved with template radio spectral indices culled from
possible contaminants expected in Fermi source fields, including quasi-stellar objects (QSOs)
and radio galaxies (Mirabal et al. 2000).
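The splitting procedure can be sketched as follows, under simplifying assumptions: features are
reduced to a hypothetical (spectral index, photon index) pair, the locus is a fixed Euclidean
radius around each centroid, and the lower-Γ centroid is taken to represent BL Lacs. The sample,
class tags, and radius are invented for illustration and do not come from the paper.

```python
import math
import random

def two_means(points, max_iter=100):
    """K = 2 Lloyd iteration seeded with the lowest- and highest-photon-index points."""
    ordered = sorted(points, key=lambda p: p[-1])
    centroids = [ordered[0], ordered[-1]]
    labels = None
    for _ in range(max_iter):
        new = [0 if math.dist(p, centroids[0]) <= math.dist(p, centroids[1]) else 1
               for p in points]
        if new == labels:
            break
        labels = new
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return sorted(centroids, key=lambda c: c[-1])  # low photon index first

def false_positive_rate(centroids, test, radius):
    """Fraction of test objects inside a locus whose known class differs from that locus."""
    inside = wrong = 0
    for features, true_class in test:
        dists = [math.dist(features, c) for c in centroids]
        k = dists.index(min(dists))
        if dists[k] <= radius:
            inside += 1
            wrong += true_class != ("BLL", "FSRQ")[k]
    return wrong / inside if inside else 0.0

def evaluate(sample, radius, n_splits=100, train_frac=0.7, seed=0):
    """Mean false-positive rate over repeated random training/testing splits."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_splits):
        shuffled = sample[:]
        rng.shuffle(shuffled)
        n_train = round(train_frac * len(shuffled))
        train, test = shuffled[:n_train], shuffled[n_train:]
        centroids = two_means([features for features, _ in train])
        rates.append(false_positive_rate(centroids, test, radius))
    return sum(rates) / len(rates)

# Hypothetical, cleanly separated sample: ten BL Lac-like and ten FSRQ-like objects.
sample = [((0.01 * i, 2.0 + 0.01 * i), "BLL") for i in range(10)]
sample += [((0.01 * i, 2.5 + 0.01 * i), "FSRQ") for i in range(10)]
clean_rate = evaluate(sample, radius=0.2)
```

Scanning `radius` over a range of values then traces the false-positive rate as a function of
threshold, which is the quantity used to choose the locus.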
Fig. 3. Fermi photon index Γ vs. spectral index α_6 between 1.4 and 4.85 GHz for associated
Fermi sources. Blue (filled circles) and red (open circles) markers represent two distinct
clusters automatically identified by K-means. The X symbols mark the centroid of each cluster.

Fig. 4. Fermi photon index Γ vs. spectral index α_92 between 326 MHz and 1.4 GHz for associated
Fermi sources. Blue (filled circles) and red (open circles) markers represent two clusters
automatically identified by K-means. The X symbols mark the centroid of each cluster.



Generally, we found that the K-means algorithm is a powerful Fermi classifier using a locus with
a narrow dispersion (~ 1.5σ) around the centroids of the distribution. Larger loci start to
capture outliers that might include radio AGN outbursts with steep radio spectral indices
|α| > 0.5. At larger distances from the centroids, we also start to see a degeneracy with radio
galaxies and certain types of non-blazar AGNs in the Γ-α space. By picking a restrictive locus
with a small dispersion, K-means only classifies the core of the distribution (~ 75% of the
objects) while missing or filtering the outliers of a particular data set. However, in return,
the algorithm optimises the classification process by reducing the number of false positives
(< 20%).



4. Results 

Once the best locus parameters were determined, we applied the K-means algorithm to objects
found inside the error regions of each of the 18 unassociated Fermi sources. In order to gather
a full census of possible candidates, we consider all the radio objects within the 99.7%
confidence location contour for each source. The 99.7% confidence level radius r_99.7 is related
to the 95% confidence level radius r_95 derived by Abdo et al. (2010a) through
r_99.7 = 1.39 × r_95.
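The factor 1.39 is what one obtains if the positional uncertainty is assumed to follow a
circular two-dimensional Gaussian, for which the radius enclosing a probability p scales as
sqrt(-2 ln(1 - p)). The short check below rests on that assumption, which is ours and is not
stated in the text; the function name is illustrative.

```python
import math

def containment_radius(prob, sigma=1.0):
    """Radius enclosing `prob` of a circular 2-D Gaussian with per-axis width sigma."""
    return sigma * math.sqrt(-2.0 * math.log(1.0 - prob))

# Ratio of the 99.7% to the 95% containment radius; sigma cancels out.
ratio = containment_radius(0.997) / containment_radius(0.95)
print(round(ratio, 2))  # 1.39
```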

Taking radio emission as a proxy of γ-ray emission, we start with the positions of all FIRST
radio detections within each r_99.7 error region. We then proceed to match each individual FIRST
location with the GB6 and WENSS catalogues as detailed in §2. Finally, we derive α_6 and α_92
radio spectral indices for any object detected in at least two radio frequency bands. The
process is very fluid as radio sources are typically sparse. On average there are 90 sources per
deg² down to the FIRST flux limit and even fewer in the GB6 and WENSS.
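To give a feeling for how sparse the candidate lists are, the expected number of FIRST
detections in a circular error region follows from the quoted surface density and the 1.39
scaling between confidence radii. The 95% radius used below is a hypothetical value chosen only
for illustration.

```python
import math

FIRST_DENSITY_PER_DEG2 = 90.0   # average FIRST source density quoted in the text
R997_OVER_R95 = 1.39            # scaling between confidence radii quoted in the text

def expected_first_sources(r95_deg):
    """Expected FIRST detections inside a circular 99.7% error region."""
    r997_deg = R997_OVER_R95 * r95_deg
    return FIRST_DENSITY_PER_DEG2 * math.pi * r997_deg ** 2

# Hypothetical source with a 95% error radius of 0.1 deg (6 arcmin).
n_expected = expected_first_sources(0.1)
```

For this example the estimate is roughly five FIRST sources, of which only those also detected
by WENSS or GB6 can be assigned a spectral index.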

Next, each object with a measured spectral index is compared to the locus found using the
associated sample. This is a two-step procedure that initially ranks the radio objects within
the Fermi error region according to their Euclidean distance to the nearest cluster centroid
(objects with the smallest distance are ranked at the top). It then flags each object either as
a possible association (if a radio candidate lies within the locus) or unassociated (outside the
locus). In total, the automated K-means algorithm returned possible counterparts for 8 out of 18
unassociated Fermi sources in our sample. Table 1 lists the 8 sources with their respective AGN
association. In the case of 1FGL J0942.1+4313, we were not able to perform the K-means algorithm
as none of the FIRST sources within its 99.7% error contour had an equivalent WENSS or GB6
counterpart. Unassociated sources without any apparent counterparts are summarised in Table 2
(see appendix for further details).
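The two-step procedure (rank, then flag) can be sketched as below. The centroids, candidate
names, feature values, and the fixed locus radius are all hypothetical; in the paper the locus
is defined by a dispersion of about 1.5σ around the centroids rather than by a single radius.

```python
import math

def rank_candidates(candidates, centroids, locus_radius):
    """Rank candidates by distance to the nearest centroid and flag those inside the locus."""
    ranked = []
    for name, features in candidates.items():
        distance = min(math.dist(features, c) for c in centroids)
        ranked.append((distance, name))
    ranked.sort()  # smallest distance first
    return [
        (name, round(distance, 3),
         "possible association" if distance <= locus_radius else "unassociated")
        for distance, name in ranked
    ]

# Hypothetical centroids and candidates in (radio index, radio index, photon index) space.
centroids = [(0.0, 0.1, 2.0), (0.0, -0.2, 2.5)]
candidates = {
    "radio B": (-0.9, -0.8, 2.2),
    "radio A": (0.1, 0.0, 2.4),
}
ranking = rank_candidates(candidates, centroids, locus_radius=0.3)
```

Here "radio A" is ranked first and flagged as a possible association, while "radio B" falls
outside the locus and stays unassociated.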

5. Spectral typing 

Optical spectroscopy is arguably the most powerful tool to accomplish the classification of
objects within a particular γ-ray error contour (Mirabal et al. 2000; Mukherjee et al. 2000).
However, manually sorting through dozens of spectra is painstaking and difficult. The K-means
algorithm eases the identification process by ranking all the objects within the error contour.

We take advantage of the K-means results to search for publicly available optical spectra at the
position of each new association listed in Table 1. In addition to deep optical photometry, the
SDSS is carrying out an impressive SDSS Spectroscopic Survey that will eventually obtain
calibrated spectra for about one million objects with a spectral wavelength range 3800-9200 Å
and a resolution of 1800 (Stoughton et al. 2002). After sifting through this massive sample, we
find that 4 out of the 8 associated sources have corresponding SDSS spectra. Two (CRATES
J101811+354229 and 3C 345) are spectroscopically classified as FSRQs at z = 1.228 and z = 0.588
respectively. Two additional sources (FIRST J113812.1+411353 and 1RXS J125716.0+364713) look
like good BL Lac candidates without obvious redshifts.

For completeness, we also combed the SDSS Spectroscopic Survey for spectra at the positions of
all FIRST sources contained within the 99.7% error regions for the remaining 10 unassociated
Fermi sources. Most of the matching spectra are either run-of-the-mill QSOs or galaxies.
However, we have localised an additional BL Lac associated with FIRST J124946.7+370748 inside
the error contour of 1FGL J1249.8+3706. Accordingly, we add the latter to the associated column.
We note that FIRST J124946.7+370748 failed to be associated by the K-means algorithm as it is
only detected in a single frequency with a 1.4 GHz FIRST flux density of 5.75 mJy. SDSS optical
spectra of the associated objects are shown in Fig. 5. Additional notes on individual objects
are given in the appendix.



6. Discussion 

We have successfully applied the K-means classification algorithm to 18 unassociated Fermi
sources within the 'overlap region'. The algorithm trained on associated sources enables the
potential classification of 8 new Fermi sources. Adding these to one additional source
spectroscopically associated with 1FGL J1249.8+3706 reduces the number of 1FGL unassociated
sources in the 'overlap region' from 18 to 9. Proper accounting indicates that 119 out of 128
Fermi sources (93% of all Fermi sources) are associated within said region. In contrast, outside
the 'overlap region', only 508 out of 806 (~ 63%) Fermi sources at |b| > 15° have been
associated. Assuming that the 1FGL sky coverage is nearly uniform outside the Galactic plane
(|b| > 10°), where diffuse γ-ray emission is less prominent, we find that the percentage of
unassociated Fermi sources in the 'overlap region' suggests that < 20% of all remaining
unassociated 1FGL sources at |b| > 15° might host new types of γ-ray sources. As noted earlier,
such discrepancy is partially due to better coverage and more complete catalogues in the
northern sample.
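The text does not spell out the arithmetic behind the < 20% figure. One plausible reading, shown
here purely as an assumption, is to scale the residual unassociated fraction of the 'overlap
region' to the sources outside it and compare the result with the number of sources that remain
unassociated there. All input counts are taken from the paragraph above.

```python
# Counts quoted in the text.
overlap_total, overlap_associated = 128, 119
outside_total, outside_associated = 806, 508

# Residual unassociated fraction inside the 'overlap region' (9 of 128 sources).
residual_fraction = (overlap_total - overlap_associated) / overlap_total

# Sources still unassociated outside the region (298).
outside_unassociated = outside_total - outside_associated

# Assumed scaling: the same residual fraction would survive a comparably deep search elsewhere.
expected_residual_outside = residual_fraction * outside_total
exotic_share = expected_residual_outside / outside_unassociated

print(f"residual fraction in overlap region: {residual_fraction:.1%}")
print(f"share of unassociated sources outside it: {exotic_share:.1%}")
```

Under this reading about 7% of sources resist association in the best-covered region, which
corresponds to roughly 19% of the sources currently unassociated elsewhere at high latitude.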

However, we are faced with a clear puzzle: what kind of counterparts are hiding among the 9
'exotic' outliers? To explore this question we plot in Figure 2 the Fermi γ-ray flux density
from Abdo et al. (2010a) versus the 1.4 GHz FIRST flux density for the 9 unassociated Fermi
sources. In the same plot, we include the complete sample of associated Fermi sources within the
'overlap region'. For the unassociated sources, we assign a radio upper limit from the flux
density of the brightest FIRST radio source within the error region that is not
spectroscopically classified either as a galaxy or QSO (see Table 2).

While there is significant scatter between the two quantities, we notice that in general
brighter γ-ray sources tend to be brighter in radio. A simple power-law fit of the γ-ray flux
density f_γ against S_1.4 has been drawn through the points to show an indicative general trend.
Interestingly, our best-fit slope at 1.4 GHz is identical to the result of Ghirlanda et al.
(2010) based on an entirely different set of associated Fermi sources acquired at 20 GHz. With
the notable exception of 1FGL J1527.6+4152 at 77 mJy, the Fermi γ-ray flux density and radio
upper limits of the 9 unassociated sources tend to lie at the faint end of the distributions
(S_1.4 ≲ 27 mJy). The observed scatter of radio flux values allows for the possibility that the
actual associations for these outliers are high-redshift or dust-obscured analogues of sources
already present in the 1FGL.

The latter reasoning is weakly reinforced by the K-means association of 1FGL J0753.1+4649
(without an apparent optical counterpart) and the fact that an important fraction of the
remaining FIRST radio sources detected within the error regions of the 9 unassociated Fermi
sources lack an SDSS optical counterpart to a limit of r > 23.1. Alternatively, the outliers
could have intrinsically fainter radio counterparts. Either requirement could be met with the
known population of γ-ray emitters in the 1FGL without invoking new types of sources. For
instance, although rare, it is possible that at least one high-latitude radio-quiet pulsar could
be hiding within the 'overlap region' (Mirabal & Halpern 2001; Halpern et al. 2002; Abdo et al.
2009b).



Mirabal, Nieto & Pardo: The exotic fraction among unassociated Fermi sources 



7. Conclusions 

It is important to emphasise once again that the associations reported here are not final
identifications but rather statistically significant matches. While the matching procedure for
radio sources is not perfect, the conclusions appear firm since there is little contamination
from physically unrelated sources (Kimball & Ivezić 2008). Ultimately, the goal of this paper is
to introduce an algorithm that can facilitate the search for counterparts within Fermi error
circles by ranking their similarity with previously well-identified Fermi counterparts.
Empirically, this method is already implemented in manual searches but the algorithm presented
here automates the process. The multifrequency observer can now start to study a certain γ-ray
error circle with a possible order of priorities. Strictly speaking, the detection of
contemporaneous variability, pulsations, or spatial extent are the only paths to directly prove
a physical connection. Unfortunately, except for radio-loud pulsars and a fraction of the
brightest Fermi AGN, a firm identification might have to wait for the capabilities of future
very high-energy (> 100 GeV) γ-ray arrays that could start to probe correlated variability with
more sensitive instruments (Hinton & Hofmann 2009).

Clearly, the association of Fermi sources at the lower end of the flux distribution looms as a
pressing issue not only to complete the census of unassociated sources but also to help pinpoint
objects responsible for the unresolved, extragalactic diffuse γ-ray emission at energies above
200 MeV (Abdo et al. 2010a). While the absence of obvious counterparts lets us entertain the
possibility that a handful of unconventional objects such as dark matter subhaloes are powering
certain Fermi sources, the approach presented here represents a first attempt to cap their
actual fraction. Yet, we caution that the pay-off for finding a dark matter signal among
unassociated Fermi sources would be so critical for the progress of fundamental physics that it
merits continued efforts to identify every single source in the 1FGL.

Possibly, the ideal association process for the outliers would involve collecting spectroscopy
for the totality of sources within their error regions. Presumably, such coverage will be hard
to accomplish with current instrumentation, especially given the difficulties at the low end of
the flux distribution across bands. As shown here, classification algorithms could play a
crucial role in isolating clusters of truly 'exotic' objects. However, the variables and locus
definition presented here should not be taken as universal models. In the future, further
refinement of the algorithm will take place when additional coverage in various wavelengths is
completed and a final classification of all the sources is reached. Once final classifications
are achieved, additional work shall explore alternative classification algorithms as well as
additional input variables. One foreseeable limitation for the K-means algorithm appears to be
the flux depth of existing surveys, but that could be improved with dedicated observations. We
close by noting that the estimated fraction could be further restricted with dedicated
multi-wavelength efforts and pulsation searches in the 'overlap region'.
Appendix A: Notes on individual sources

A.1. Associated 

1FGL J0753.1+4649.- Two FIRST sources are catalogued
within the 99.7% error region. FIRST J075339.9+464824 (S_1.4
Fig. 5. SDSS spectra of newly associated sources.



Table 1. Fermi sources with associations

1FGL Name           Assoc. AGN               S_1.4 (mJy)   Spectral class   z       K-means ass.?^a   Spectr.?^b
1FGL J0753.1+4649   FIRST J075339.9+464824   12.36         -                -       Y                 N
                    FIRST J075309.7+465155   12.83         -                -       N                 N
1FGL J1016.2+3548   CRATES J101811+354229    582.58        FSRQ             1.228   Y                 Y
1FGL J1129.3+3757   FIRST J112903.2+375656   23.43         -                -       Y                 N
1FGL J1138.0+4109   FIRST J113812.1+411353   22.44         BL Lac?          -       Y                 Y
1FGL J1249.8+3706   FIRST J124946.7+370748   5.75          BL Lac?          -       N                 Y
1FGL J1256.9+3650   1RXS J125716.0+364713    73.86         BL Lac           -       Y                 Y
                    4C +36.22                716.97        Seyfert 1        0.709   N                 Y
1FGL J1323.1+2942   4C +29.48                1412.49^c     -                -       Y                 N
1FGL J1642.5+3947   3C 345                   6598.19       FSRQ             0.588   Y                 Y
1FGL J1649.6+5241   FIRST J164924.9+523515   44.30         -                -       Y                 N

^a "Y" indicates that the object has been associated by the K-means algorithm.
^b "Y" indicates that the object has been associated spectroscopically.
^c This value represents the sum of the individual components of a multi-component radio source.



Table 2. Fermi sources without associations

1FGL Name           Radio upper limit S_1.4 (mJy)   E_max (GeV)^a
1FGL J0736.4+4053   < 0.82                          42.5
1FGL J0900.5+3410   < 19.64                         29.6
1FGL J0942.1+4313   < 3.38                          35.8
1FGL J1226.0+2954   < 13.78                         10.0
1FGL J1515.5+5448   < 26.89                         14.2
1FGL J1527.6+4152   < 77.77                         21.5
1FGL J1553.9+4952   < 7.72                          265.4
1FGL J1627.6+3218   < 14.68                         7.6
1FGL J1630.5+3735   < 6.09                          6.3

^a Maximum observed energy of Fermi LAT photons within the 99.7% Fermi error region as of 2010 May 31 UT.



= 12.36 mJy) is picked automatically by K-means. Oddly, it lacks any apparent optical SDSS
counterpart to a limit of r > 23.1. A possible 'dark horse' candidate is the other radio source
FIRST J075309.7+465155 (S_1.4 = 12.83 mJy) with a measured SDSS optical counterpart at
r = 19.86. The latter was not detected by WENSS or GB6 and hence it could not be evaluated by
the K-means algorithm. Spectroscopy is needed to settle the issue.

1FGL J1016.2+3548.- The K-means algorithm selects CRATES J101811+354229 as an association. The
SDSS spectrum confirms this object as an FSRQ at z = 1.228 (see Fig. 5). This source was first
suggested as an affiliation by Abdo et al. (2010b). Gregory et al. (2001) designate this source
as a likely radio variable at 4.85 GHz. FIRST J101505.6+360452 is most likely a QSO at
z = 0.843.

1FGL J1129.3+3757.- FIRST J112903.2+375656 is selected as an association by K-means. No optical
spectrum is available for this object. However, the derived SDSS optical colours match the
colour-colour placement of high-confidence BL Lac candidates identified by Plotkin et al.
(2008). At the edge of the error region lies the intriguing B3 1127+380, a double-symmetric
radio source with a steep radio spectral index. 1RXS J112909.8+380847, the brightest X-ray
source within the error region, appears to be a coronal-emitting star.

1FGL J1138.0+4109.- According to K-means, FIRST J113812.1+411353 is the most likely association.
Its SDSS optical spectrum (Fig. 5) lacks any spectral features, so we regard this object as a
possible BL Lac. This source was suggested as a tentative BL Lac by Plotkin et al. (2008). A
bright X-ray source at the edge of the error region, 1RXS J113857.0+410840, corresponds to the
F star HD 101207.

1FGL J1249.8+3706.- FIRST J124946.7+370748 was selected
spectroscopically as a possible BL Lac counterpart for the γ-ray
source. The optical SDSS spectrum shows no obvious emission
or absorption lines (see Fig. 5). In radio, it is only detected by
FIRST, with a flux density of 5.75 mJy. It was not included in
the object list evaluated by the K-means algorithm since it lacks
WENSS and GB6 counterparts. This object could be representative
of possible associations at the faint end of the 1FGL flux
density distribution.

1FGL J1256.9+3650.- 1RXS J125716.0+364713 was isolated
by the K-means algorithm. The SDSS spectrum confirms the
absence of spectral features, which is consistent with a possible
BL Lac object (Fig. 5). This is another source listed as a
tentative affiliation in Abdo et al. (2010b). Another intriguing object
within the error region is 4C +36.22, a Seyfert 1 at z = 0.709
(Eracleous & Halpern 2004). Seyfert 1 galaxies have recently
been shown to be potential γ-ray emitters (Abdo et al. 2009a).
Close follow-up observations of both objects will be needed to
determine the actual counterpart.

1FGL J1323.1+2942.- 4C +29.48 is the association selected
by the K-means algorithm. The latter corresponds to a
multi-component radio object. There is very little information about
its corresponding optical counterpart. This same source was first
suggested as a possible affiliation by Abdo et al. (2010b).

1FGL J1642.5+3947.- K-means reveals 3C 345 as the likely
association. The SDSS spectrum of this object confirms it as
an FSRQ at z = 0.588 (Fig. 5). The actual radio counterpart
has the highest FIRST flux density among all 1FGL associated
sources in the 'overlap region'. This object was also noted
as a possible affiliation by Abdo et al. (2010b).

1FGL J1649.6+5241.- FIRST J164924.9+523515 was the pick
of the K-means algorithm. No SDSS optical spectrum is available
for the corresponding optical counterpart. The source was
recognised as a likely variable by Gregory et al. (2001). It was
originally tagged as a tentative affiliation by Abdo et al. (2010b).
Optical spectroscopy is needed for conclusive typing.

A.2. Unassociated 

1FGL J0736.4+4053.- The brightest FIRST radio source within
the error region is FIRST J073655.0+405351 (S_1.4 = 2.34 mJy).
The latter is listed as a radio galaxy at z = 0.352 according to
SDSS measurements. The only other radio source within the
99.7% error region is FIRST J073614.4+405326 (S_1.4 = 0.82
mJy), which reveals a blank optical field to the SDSS limit of
r > 23.1. FIRST J073610.4+405940 (S_1.4 = 18.85 mJy) lies just
outside the 99.7% error region and shows a steep spectral
index α_92 = -0.94 between WENSS (326 MHz) and FIRST (1.4
GHz).
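
For illustration, a two-point spectral index such as the one quoted
above can be computed with the short sketch below. It assumes the
convention S ∝ ν^α, which is consistent with negative values being
described as steep but is not stated explicitly here. The function
names are hypothetical, and the WENSS flux density is back-computed
from the quoted FIRST value and index rather than taken from a
catalogue.

```python
import math

WENSS_MHZ = 326.0   # WENSS observing frequency quoted in the text
FIRST_MHZ = 1400.0  # FIRST observing frequency (1.4 GHz)


def spectral_index(s_low_mjy, s_high_mjy,
                   nu_low_mhz=WENSS_MHZ, nu_high_mhz=FIRST_MHZ):
    """Two-point spectral index, assuming S is proportional to nu**alpha."""
    if min(s_low_mjy, s_high_mjy, nu_low_mhz, nu_high_mhz) <= 0:
        raise ValueError("flux densities and frequencies must be positive")
    return math.log(s_high_mjy / s_low_mjy) / math.log(nu_high_mhz / nu_low_mhz)


def extrapolate_flux(s_ref_mjy, alpha, nu_ref_mhz, nu_mhz):
    """Flux density at nu_mhz for a power law through (nu_ref_mhz, s_ref_mjy)."""
    return s_ref_mjy * (nu_mhz / nu_ref_mhz) ** alpha


# Hypothetical WENSS flux density: back-computed from the quoted FIRST
# value (18.85 mJy) and index (-0.94); it is NOT a catalogue measurement.
s_wenss = extrapolate_flux(18.85, -0.94, FIRST_MHZ, WENSS_MHZ)
alpha_92 = spectral_index(s_wenss, 18.85)
print(f"implied S_326 = {s_wenss:.1f} mJy, alpha_92 = {alpha_92:.2f}")
```

A source that is several times brighter at 326 MHz than at 1.4 GHz
therefore yields an index near -1, the regime labelled steep in these
notes.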

1FGL J0900.5+3410.- The brightest radio source within the
99.7% error region, FIRST J085917.2+340908 (S_1.4 = 55.51
mJy), has been identified as a galaxy at z = 0.55 by
Hook et al. (1998). The second brightest radio emitter, FIRST
J090051.4+342440 (S_1.4 = 19.64 mJy), is a rather steep (α_92 =
-1.24) multi-component source. Other multi-component radio
sources lie in the vicinity of the error region. The only contained
ROSAT X-ray source, 1RXS J090003.5+340905, corresponds to
a radio-quiet QSO (z = 1.937) typed by the SDSS pipeline.
Another X-ray source, catalogued by XMM-Newton as XMMSL1
J090046.0+335422, has been classified as a radio-quiet QSO at
z = 0.228 by the SDSS spectroscopic pipeline.

1FGL J0942.1+4313.- This is the faintest unassociated
Fermi source in the 'overlap region'. Only two FIRST
sources lie within its error region. One is a galaxy: FIRST
J094136.4+431752 (z = 0.15). The other radio source, FIRST
J094230.5+430920, is detected with a flux density of 3.38 mJy
at 1.4 GHz but lacks an optical SDSS counterpart to a limit of
r > 23.1.

1FGL J1226.0+2954.- The brightest radio source, FIRST
J122542.2+295616 (S_1.4 = 13.78 mJy), was automatically
selected by K-means as a possible association. However, it was
later manually discarded as a possible 'impostor' since its optical
counterpart is listed as extended by the SDSS pipeline. Analysis
of the Fermi LAT data shows no photons with energies
above 10 GeV.

1FGL J1515.5+5448.- In this case, the brightest radio source,
FIRST J151603.0+545629 (S_1.4 = 26.89 mJy), is steep and falls
in the neighbourhood of an SDSS optical object flagged as
extended. FIRST J151444.1+545027 (S_1.4 = 4.91 mJy) should also
be examined as a potential counterpart.

1FGL J1527.6+4152.- At S_1.4 = 77.77 mJy, FIRST
J152757.5+414708 is the dominant radio source within the γ-ray
error region. It is also detected by WENSS and GB6, but falls
outside the locus of association determined by K-means. Further
analysis shows that its spectral index is rather steep, α_92 = -1.04.
FIRST J152735.2+414839 (S_1.4 = 2.93 mJy) is most likely
associated with a pair of interacting galaxies at z = 0.149.

1FGL J1553.9+4952.- Three prominent ROSAT sources are
found in or around this region: 1RXS J155357.1+495930 (with
a photometric redshift z = 0.425), 1RXS J155437.5+495915
(QSO, z = 0.905), and 1RXS J155254.9+495818 (no
spectroscopy). However, none is radio loud. The brightest radio
source inside, FIRST J155234.8+495446 (S_1.4 = 7.72 mJy),
reveals a steep radio spectrum (α_92 = -0.98) and no obvious optical
counterpart. 1FGL J1553.9+4952 has the highest energy γ-ray
photon (E = 265.4 GeV) detected among the unassociated Fermi
sources in the 'overlap region'.

1FGL J1627.6+3218.- FIRST J162715.3+321652 (S_1.4 =
14.68 mJy) has a steep radio spectrum and remains undetected in the
optical by SDSS down to a limit of r > 23.1. We find no Fermi LAT
photons with energies above 10 GeV within the error region.
Interestingly, there is a single Fermi LAT photon with energy E
= 10.6 GeV that falls outside the error region but may coincide
with the radio-quiet ROSAT source 1RXS J162851.9+322655.
The latter is a rather soft X-ray object with a hardness ratio of
-0.83 that deserves some attention.
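
To make the hardness ratio above concrete, the sketch below assumes
the common definition HR = (H - S)/(H + S), where S and H are count
rates in a soft and a hard band. The text does not state which ROSAT
bands enter the quoted value, so the function name and the count rates
used here are hypothetical and serve only to reproduce a ratio of
-0.83.

```python
def hardness_ratio(soft_rate, hard_rate):
    """Return (hard - soft) / (hard + soft).

    The result runs from -1 (all counts in the soft band)
    to +1 (all counts in the hard band).
    """
    if soft_rate < 0 or hard_rate < 0:
        raise ValueError("count rates must be non-negative")
    total = hard_rate + soft_rate
    if total == 0:
        raise ValueError("at least one band must contain counts")
    return (hard_rate - soft_rate) / total


# Hypothetical count rates (counts/s), chosen only so that HR is close
# to the quoted -0.83; they are NOT measurements of the ROSAT source.
hr = hardness_ratio(soft_rate=0.0915, hard_rate=0.0085)
print(f"HR = {hr:.2f}")
```

Under this definition, a value near -0.83 means that roughly nine out
of ten detected counts fall in the soft band.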

1FGL J1630.5+3735.- Inside the error region, we find FIRST
J163056.1+374227 (S_1.4 = 6.09 mJy), which shows a steep spectral
index α_92 = -1.12. There is an extended SDSS optical
counterpart in the vicinity. The Fermi LAT data for this source contain
no individual photons with energies above 10 GeV.



